Data analysis and event prediction in a telecommunications network.



A facility is provided for enhancing an operations support system so that, based on data
generated as a result of an event occurring in an associated telecommunications network, the
operations support system can predict the likelihood of the event occurring again in the
network.
Field of the Invention

The invention relates to processing data generated by a telecommunications network and applying the re- 
sult of such processing to other network data to predict the possibility of a particular event happening in the 
network. 

Background of the Invention 

A large telecommunications network typically generates a large number of reports over a short period of
time, e.g., twenty-four hours. Such reports include data relating to, for example, telephone traffic, circuit
alarms, etc. A Network Management System (NMS) associated with the telecommunications network accumulates
and processes the data to track and manage the performance of the network. Such management includes
(a) identifying and repairing network problems when they occur, and (b) identifying trends in telephone
traffic patterns. For example, consider the way in which an NMS processes trouble alarms, which may be
classified as either transient or non-transient. Non-transient alarms are usually indicative of a "hard"
fault. Transient alarms, on the other hand, are usually indicative of either a reoccurring or a
nonreoccurring network problem. A nonreoccurring alarm is usually the result of an inconsequential fault
and, therefore, may be ignored. The reason for this is that a nonreoccurring alarm is most likely the result
of some external condition, such as, for example, a strike of lightning, causing a momentary loss of a
carrier signal at a point in the network. A reoccurring transient alarm, on the other hand, is usually
indicative of a chronic problem, for example, a network facility or circuit that fails when particular
conditions occur. Such a problem might affect the quality of telecommunications when it occurs.

In most instances, however, it is difficult to distinguish a chronic alarm from a transient, non-chronic alarm. 
Moreover, the majority of alarms that are generated in a telecommunications network are of the nonreoccurring
type. Because of this, an NMS may spend an inordinate amount of time and resources attempting to identify 
the sources of respective nonreoccurring alarms, thereby possibly delaying the repair of chronic problems. 

Summary of the Invention 

An advancement in the art of telecommunications is achieved by arranging an operations support system 
so that, in accord with an aspect of the invention, it generates a rule set from data generated by an associated 
telecommunications system and then applies the rule set to data subsequently generated by the
telecommunications system to predict that a particular event(s) will likely occur in the system. In an illustrative embodiment
of the invention, a set of predetermined measurements is generated for the data that is initially obtained from 
the telecommunications system and then supplied to a particular rule induction method which then generates 
a number of rules as a function of such measurements and selects a best rule set from the generated rules 
such that the selected rule set is capable of predicting accurately the occurrence of past events characterized
by the data. The rule set is then used to predict future occurrences of the events. 

Brief Description of the Drawing 

In the FIGs:

FIG. 1 shows a broad block diagram of an operations system; and 

FIG. 2 shows in flow chart form the program which implements the invention in the rule generator of FIG. 1.

Detailed Description

Turning now to FIG. 1 there is shown a broad block diagram of a conventional telecommunications network 
200 and operations support system 300. Operations support system 300, more particularly, supports the
overall operation of network 200. One aspect of such support deals with tracking the performance of network 200
by accumulating and processing alarm messages generated by individual facilities comprising network 200. 
An alarm message may be generated by a facility as a result of one of a number of different conditions. Such 
conditions include (a) a spurious problem that occurs for just a short period of time, (b) a chronic problem that 
occurs periodically, but which can cause an associated facility to slowly degrade over time, (c) a "hard" failure 
which may cause the associated facility to become inoperable, etc. The number of such alarms that network 
200 may generate within a particular period of time, e.g., 24 hours, could be very high. Such alarms are
accumulated by system 300 and stored in database 30. An alarm message includes information identifying a
particular problem and the identity of the network 200 facilities that might be affected by the problem.

The alarm messages that are stored in database 30 may be accessed by facility type. For example, all
alarm messages associated with a particular type of facility, such as echo cancelers, may be accessed by
presenting a request identifying the particular type of facility to database 30. Such access may be refined
by also specifying a particular period of time in the request. In response to the request, database 30 outputs
all alarm messages that occurred over the specified period of time for the identified type of facility.

One dilemma that an operations support system faces is which of the large number of alarms that are
generated on a daily basis should be addressed first. It is apparent that alarms that are generated as a result of
a "hard" fault are usually easy to identify and are addressed first. However, the majority of alarms that are
generated daily are typically due to spurious conditions and chronic problems. Moreover, it is difficult to
determine initially whether an alarm occurred as a result of a spurious condition or a chronic problem. As such,
much effort is sometimes directed to tracking down the cause of spurious alarms since they comprise the
majority of alarms that occur each day. Accordingly, alarms identifying a chronic problem may not be dealt with
until the facility generating such alarms degrades to a hard fault.

Recognizing that problem, we have adapted an operations support system so that it processes alarms to
identify chronic problems and prioritize the repair of those problems. To that end, Operations Support System
(OSS) 300 includes rule generator 40 and operations system processor 50 which operate in accordance with
the invention to identify and prioritize chronic problems. More particularly, rule generator 40, which may be,
for example, embodied in a SPARC station 2 available from Sun Microsystems, Inc., responds to receipt of a
command from an external source, e.g., terminal T2, by sending an access request identifying a facility type
and period of time to database 30. As mentioned above, database 30, in response to such a request, outputs
all alarm messages that occurred over the specified period of time for the type of facility identified in the
request, in which the facility type may be, for example, echo cancelers, T1 carriers, T3 carriers, toll switches,
etc. The specified period of time is typically one to several weeks and may be identified by a starting date and
an ending date.

Assuming that the requested period of time equals two weeks and the facility type is T1 carriers, then the
number of alarm messages of the requested type that database 30 outputs could be very large if network 200
employs an appreciable number of T1 carriers. Upon receipt of the alarm messages, generator 40 stores them
in its associated memory 41 such that alarm messages associated with a specific facility are stored in a
respective memory array. When all such messages have been received from database 30 and stored in memory
41, then generator 40 processes the messages and generates, in accord with an aspect of the invention, a set
of rules which may be applied to identify which of the alarms stored in memory 41 are indicative of respective
chronic problems. The rules are then passed to operations system processor 50, which then applies the rules
to alarm messages associated with the same type of facility, but occurring thereafter. Such rules are also
passed to processor 60, as discussed below.
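
As a rough illustration of the storage step described above, the following Python sketch filters alarm messages by facility type and period of time and collects them into one list per facility. The `AlarmMessage` fields, the facility identifiers and the dates are assumptions made for illustration only; the text does not specify the message format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmMessage:
    # Hypothetical record layout; the source does not define the message format.
    facility_id: str
    facility_type: str
    alarm_type: str
    timestamp: datetime


def group_by_facility(messages, facility_type, start, end):
    """Keep messages of one facility type inside [start, end] and
    collect them into one list per facility."""
    arrays = defaultdict(list)
    for msg in messages:
        if msg.facility_type == facility_type and start <= msg.timestamp <= end:
            arrays[msg.facility_id].append(msg)
    return dict(arrays)


messages = [
    AlarmMessage("T1-001", "T1", "out_of_frame", datetime(1993, 10, 1, 8, 0)),
    AlarmMessage("T1-001", "T1", "slip", datetime(1993, 10, 2, 9, 30)),
    AlarmMessage("T1-002", "T1", "slip", datetime(1993, 10, 3, 14, 0)),
    AlarmMessage("EC-007", "echo_canceler", "alarm", datetime(1993, 10, 3, 15, 0)),
    AlarmMessage("T1-002", "T1", "slip", datetime(1993, 11, 1, 0, 0)),  # outside the period
]

# Two-week request for T1 carriers; the result plays the role of memory 41.
memory_41 = group_by_facility(
    messages, "T1", datetime(1993, 10, 1), datetime(1993, 10, 14, 23, 59)
)
```

Keying the result by facility identifier mirrors the "respective memory array" per facility, so that later steps can process one facility at a time.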

The program which controls the operation of processor 40 to generate a particular set of rules for the
processing of alarm messages is shown in flow chart form in FIG. 2. Specifically, the program is entered at block
2000 responsive to receipt of the aforementioned command. At block 2000, the program proceeds to block 2001
where it forms a request message containing the identity of the facility type and particular period of time, as
discussed above. The program then supplies the message to database 30. As mentioned above, database 30,
in turn, (a) unloads the alarm messages from its associated memory for the specified facility type and for the
specified period of time and (b) passes them to processor 40. Processor 40, in turn and under control of the
program, stores the messages in associated memory 41, as mentioned above. When such alarm messages
have been so stored, the program then proceeds to block 2002. (Hereinafter the term "alarm" may also be
referred to as an event.)

At block 2002, the program processes the stored messages to obtain a number of different performance
measurements for each facility of the specified type. The measurements summarize spatial and temporal
information about a facility and associated problems. Such measurements may include the number of (a)
different types of alarm messages that occurred for the specific facility, (b) times an alarm (event) occurred
during a unit of time for the facility, and (c) units of time during which the alarm (event) occurred, as well
as (d) the location of the facility. As will be discussed below, such measurements are determined for first and
second windows, W1 and W2, comprising the aforementioned specified period of time. In addition, each window
is partitioned into units of time for the purpose of performing such measurements. (An illustrative list of such
performance measurements, or features, is shown in Appendix A.)
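
The per-window measurements can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the time unit is taken to be one hour, the period is split into two equal windows, and only three of the Appendix A measurements (ALRD, ALRCT, AVGAL) are computed. None of these choices is fixed by the text.

```python
from datetime import datetime, timedelta


def window_measurements(timestamps, start, end, unit=timedelta(hours=1)):
    """Summarize the alarms of one facility that fall in [start, end)."""
    in_window = [t for t in timestamps if start <= t < end]
    # Index of the time unit each alarm falls into, counted from the window start.
    units_with_alarms = {(t - start) // unit for t in in_window}
    alrct = len(in_window)                 # total number of alarms
    alrd = len(units_with_alarms)          # number of time units with alarms
    avgal = alrct / alrd if alrd else 0.0  # average alarms per time unit with alarm
    return {"ALRD": alrd, "ALRCT": alrct, "AVGAL": avgal}


# Hypothetical two-week period split into windows W1 and W2 of one week each.
period_start = datetime(1993, 10, 1)
split = datetime(1993, 10, 8)
period_end = datetime(1993, 10, 15)

alarms = [
    datetime(1993, 10, 1, 8, 5),
    datetime(1993, 10, 1, 8, 40),   # same hour as the previous alarm
    datetime(1993, 10, 2, 9, 0),
    datetime(1993, 10, 9, 12, 0),   # falls in W2
]

w1 = window_measurements(alarms, period_start, split)
w2 = window_measurements(alarms, split, period_end)
```

Two alarms in the same hour count once toward ALRD but twice toward ALRCT, which is what lets the measurements separate a short burst from a problem that recurs across many time units.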

Specifically, blocks 2002, 2003 and 2004 constitute a "do" loop for processing alarms associated with the
identified facility. Once it identifies such messages, then the program generates the different performance
measurements for each window, W1 and W2, and for the units of time associated with those windows. In
addition, the program accesses database 20 to determine the circuit layout for the facility within network 200.

EP 0 650 302 A2 



The circuit layout includes the identities of the other network 200 elements that may be connected directly to
the facility. The program then performs the same processing steps for a next facility of the specified type. The
program then proceeds to block 2005 upon processing the alarm messages for all of the facilities of the
specified type.

At block 2005 the program builds a data file, in which a record of the data file comprises a number of fields
for entering respective performance measurements derived for a specific facility and the circuit layout for the
facility. The data file also includes a header which identifies the fields forming a record and identifies the
con[...]. The program then proceeds to block 2006 once it has completed the building of the data file. At
block 2006, the program enters a rule generation program, which processes the data file using rule induction
methods to generate a set of rules for the specified type of facility. For example, the rules may be of the form
"if A > B, then choose class 1", where A may represent a performance measurement, B may represent a
particular value and class 1 may represent a chronic problem.
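
A data file of this general shape, and a rule of the form "if A > B, then choose class 1", can be sketched as follows. The field names, the CSV layout, the `class` column and the threshold value are assumptions chosen for illustration; the text does not specify the file format.

```python
import csv
import io

# Hypothetical header: the fields that make up one record per facility.
FIELDS = ["facility_id", "ALRD", "ALRCT", "connected_elements", "class"]


def build_data_file(records):
    """Write a header line followed by one record per facility."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def threshold_rule(measurement, value):
    """Rule of the form 'if A > B, then choose class 1'.
    Returns a function that reports whether the rule fires for a record."""
    return lambda record: float(record[measurement]) > value


records = [
    {"facility_id": "T1-001", "ALRD": 9, "ALRCT": 30,
     "connected_elements": "SW-3;EC-7", "class": "chronic"},
    {"facility_id": "T1-002", "ALRD": 1, "ALRCT": 1,
     "connected_elements": "SW-3", "class": "nonchronic"},
]

data_file = build_data_file(records)
rows = list(csv.DictReader(io.StringIO(data_file)))

rule = threshold_rule("ALRD", 5)  # A = ALRD, B = 5 (illustrative value)
predicted_chronic = [row["facility_id"] for row in rows if rule(row)]
```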

(It is noted that the rule induction program represented by block 2006 follows the rule induction technique
disclosed in an article entitled "Reduced Complexity Rule Induction", by S. Weiss and N. Indurkhya and
published in the Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Sydney,
Australia, 1991, pages 678-684, which is hereby incorporated by reference. Alternatively, a number of rule
induction programs are commercially available, such as the C4.5 program available from Morgan Kaufmann
Publishers, San Mateo, California, U.S.A.)

Briefly, rule induction methods attempt to find a compact rule set that completely covers and separates
various classes of data, e.g., alarm messages respectively relating to spurious and chronic problems. A covering
set of rules is found by heuristically searching for a single best rule that covers different cases, e.g., records
in the data file, for only one class. Once a best conjunctive rule is found for a particular class, e.g., chronic or
nonchronic problems, the rule is then added to the rule set and the cases satisfying it are removed from further
consideration. This process is repeated until there are no remaining cases to be covered. Once a covering rule
set is found that separates the classes, the induced set of rules is further refined by using either pruning or
statistical techniques. The initial covering rule set is then scaled back to the most statistically accurate subset
of rules using training and testing evaluation methods.
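
The covering loop described above can be sketched in Python as follows. This is a generic, simplified sequential-covering procedure, not the technique of the cited article or of any commercial program: it considers only conditions of the form "measurement > value", picks conditions greedily by precision, and omits the pruning and statistical refinement steps. The sample cases are invented for illustration.

```python
def covers(rule, features):
    """A rule is a list of (measurement, value) conditions, all of which must hold."""
    return all(features[name] > value for name, value in rule)


def grow_rule(cases, target):
    """Greedily add conditions until the rule covers only `target` cases."""
    rule, covered = [], list(cases)
    while any(label != target for _, label in covered):
        best = None
        for name in covered[0][0]:
            for value in {f[name] for f, _ in covered}:
                candidate = rule + [(name, value)]
                subset = [c for c in covered if covers(candidate, c[0])]
                hits = sum(1 for _, label in subset if label == target)
                if hits == 0:
                    continue
                score = (hits / len(subset), hits)  # precision first, then coverage
                if best is None or score > best[0]:
                    best = (score, candidate, subset)
        if best is None:
            return None  # no condition isolates the remaining target cases
        _, rule, covered = best
    return rule


def induce_rules(cases, target):
    """Add one best rule at a time and remove the cases it covers."""
    rules, remaining = [], list(cases)
    while any(label == target for _, label in remaining):
        rule = grow_rule(remaining, target)
        if rule is None:
            break
        rules.append(rule)
        remaining = [c for c in remaining if not covers(rule, c[0])]
    return rules


cases = [
    ({"ALRD": 9, "SLL": 0}, "chronic"),
    ({"ALRD": 7, "SLL": 1}, "chronic"),
    ({"ALRD": 1, "SLL": 6}, "chronic"),
    ({"ALRD": 2, "SLL": 0}, "nonchronic"),
    ({"ALRD": 1, "SLL": 1}, "nonchronic"),
]
rules = induce_rules(cases, "chronic")
```

On this toy data the procedure needs two rules, one on ALRD and one on SLL, because no single condition covers all three chronic cases without also covering a nonchronic one.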

When the program has generated a number of sets of rules, it applies those rule sets to the test data portion
of the data file generated at block 2005. In doing so, it prunes the rule sets so that an error rate defining the
accuracy of each rule set may be minimized when applied to the test data. That is, the program strives to
improve the accuracy of a rule set to predict the occurrence of chronic as well as nonchronic alarms. Following
the foregoing, the program then proceeds to block 2007.

The program at block 2007 selects a rule set from the sets of rules generated at block 2006, in which the
selection is based on a specification provided by an external source, for example, a user or another program.
Such a specification may request a rule set having a particular predictive accuracy for a certain class of
problems, such as chronic problems. The specification may also include coverage for a particular class of
problems. Coverage as defined herein is a measure of the range of particular problems, e.g., chronic problems,
that a rule set can predict accurately. For example, the following illustrates one possible rule set:

ALRD > x1
FBL > x2
SLL > x3
OFL > x4 or
BVL > x5

where x1 through x5 are particular values that are generated by the rule generator. The coverage of the above
rule set may be increased by adding, for example, the following rule to form a rule set of 2:

CT2 & ALRCT > x6 & ALRH > x7;

where x6 and x7 are also values generated by the rule generator.

It is noted, however, that increasing the coverage of a rule set may lead to a decrease in the predictive 
accuracy of the rule set. The reason for this is that as the number of cases that are covered increases it is 
likely that errors will occur.
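
The trade-off between coverage and predictive accuracy, and selection against an externally supplied specification, can be illustrated with a short Python sketch. The definitions used here are assumptions: accuracy is taken as the fraction of flagged cases that are truly chronic, and coverage as the fraction of chronic cases that are flagged. The thresholds and test cases are invented, and a rule set is treated as firing when any one of its rules fires.

```python
def rule_fires(rule, features):
    """A rule is a list of (measurement, value) conditions, all of which must hold."""
    return all(features[name] > value for name, value in rule)


def evaluate(rule_set, cases, target="chronic"):
    """Return (accuracy, coverage) of a rule set for the target class."""
    flagged = [label for features, label in cases
               if any(rule_fires(rule, features) for rule in rule_set)]
    actual = sum(1 for _, label in cases if label == target)
    correct = sum(1 for label in flagged if label == target)
    accuracy = correct / len(flagged) if flagged else 0.0
    coverage = correct / actual if actual else 0.0
    return accuracy, coverage


def select_rule_set(rule_sets, cases, min_accuracy):
    """Among rule sets meeting the accuracy specification, pick the widest coverage."""
    eligible = [rs for rs in rule_sets if evaluate(rs, cases)[0] >= min_accuracy]
    if not eligible:
        return None
    return max(eligible, key=lambda rs: evaluate(rs, cases)[1])


test_cases = [
    ({"ALRD": 9, "SLL": 0}, "chronic"),
    ({"ALRD": 7, "SLL": 1}, "chronic"),
    ({"ALRD": 1, "SLL": 6}, "chronic"),
    ({"ALRD": 2, "SLL": 4}, "nonchronic"),
    ({"ALRD": 1, "SLL": 1}, "nonchronic"),
]

narrow = [[("ALRD", 5)]]                # one rule
broad = [[("ALRD", 5)], [("SLL", 3)]]   # the same rule plus one more

narrow_scores = evaluate(narrow, test_cases)
broad_scores = evaluate(broad, test_cases)
```

Adding the second rule lets the rule set reach the third chronic case but also flags one nonchronic case, so a stricter accuracy specification selects the narrow rule set and a looser one selects the broad rule set.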

When the program at block 2007 selects a rule set in accordance with the externally provided specification,
it then proceeds to block 2008, where it outputs the rule set to processor 50 and processor 60. The program
then proceeds to block 2009 where it checks to see if it is done, that is, whether the program has been directed
to generate a rule set for another type of facility. If that is the case, then the program proceeds to block 2010
where it obtains the identity for the specified facility and then returns to block 2001. Otherwise, the program exits.

Returning to FIG. 1, operations system processor 50 applies the rule set to new alarm messages that it
obtains from communications network 200 so that it can accurately predict the occurrence of a particular
chronic problem and output a message to that effect via path 51. In this way, processor 50 may direct attention away
from spurious problems (nonchronic problems) and focus attention on chronic problems so that such problems
may be dealt with before they degrade into a hard fault.

The rule set outputted by processor 40 may also be used to predict trends in the occurrence of chronic
problems. Quality control processor 60 uses the rule set to determine such trends, and uses the trends in
conjunction with other quality control criteria, e.g., the rate at which chronic problems are being repaired, to
evaluate the performance of network 200. For example, if a trend indicates that a particular section of network
200 may exceed its number of allowable chronic problems for a given period of time, then processor 60 uses
that trend to output a performance value for that section of network 200.

As mentioned above, rule induction can be applied to other applications as a way of generating rule sets
for those applications. For example, one application may be traffic data in which rule induction may be applied
to such data to generate a set of rules that may be used to predict particular trends in future traffic patterns.
In particular, traffic data may be collected over a period of time and stored in a database, e.g., database 30. The
above-described program at block 2001 (FIG. 2) may then access that data as well as data indicative of the
topology of the network in the manner discussed above. The program may then divide the traffic data into
respective windows W1 and W2, and then generate a rule set for the data. The measurements made in W1 are
used to predict the measurements in W2. If the prediction is accurate, then the rule set may be used to predict
future traffic patterns, in which such predictions may include, for example, appropriate levels of blocking
for particular telephone routes.

As another example, when a disaster strikes a particular area, e.g., a hurricane, particular controls are
activated to automatically block low priority calls so that such calls do not impact the processing and routing
of high priority calls. The levels of such controls can be predicted accurately by processing traffic data
accumulated during a previous disaster, in accord with the invention, to generate a rule set that can be used to predict
such levels of control.

The foregoing is merely illustrative of the principles of the invention. Those skilled in the art will be able
to devise numerous arrangements, which, although not explicitly shown or described herein, nevertheless
embody those principles that are within the spirit and scope of the invention.
APPENDIX A

Some examples of Performance Measurements for facilities with failures:

Measurement Name    Description

ALRD                Number of time units with alarms
ALRH                Number of time units with alarms during the last day
ERHD                Number of time units with exceptions over a threshold H
ERLD                Number of time units with exceptions over a threshold L
OFH                 Number of time units with out-of-frame exceptions over a threshold H
OFL                 Number of time units with out-of-frame exceptions over a threshold L
FBL                 Number of time units with framing bit exceptions over a threshold L
BVL                 Number of time units with bipolar violations over a threshold L
SLL                 Number of time units with slips over a threshold L
ALRCT               Total number of alarms
AVGAL               Average number of alarms per time unit with alarm
OST                 Seconds facility was out of service
CT1                 Type of circuit - Domestic circuit
CT2                 Type of circuit - International circuit
CT3                 Type of circuit - Private lines
Claims



1. A method of processing particular data to generate at least one rule set operable for predicting particular
events [...] in a telecommunications system, said particular data being data [...] characterizing prior ones
of said events, said method [...] ones of said events.

2. The method set forth in claim 1 FURTHER CHARACTERIZED [...]

3. The method set forth in claim 1 FURTHER CHARACTERIZED IN THAT [...] the step of generating a plurality
of rule sets [...] plurality of rule sets which meets a predetermined specification.

4. The method set forth in claim 3 FURTHER CHARACTERIZED IN THAT [...] includes a predetermined level of
accuracy [...]

5. The method set forth in claim [...] FURTHER CHARACTERIZED IN THAT said step of generating includes
the steps of
generating a data file comprising said predetermined measurements, and
supplying said data file to a rule induction generator to generate said plurality of rule sets.

6. The method set forth in claim [...] FURTHER CHARACTERIZED IN THAT said particular data defines
respective chronic problems that occurred in said telecommunications system.

7. The method set forth in claim 1 FURTHER CHARACTERIZED IN THAT said particular data defines
different traffic patterns within said telecommunications system.

8. The method set forth in claim [...] FURTHER CHARACTERIZED IN THAT said step of generating includes
the step of
[...] to an operations system processor so that said processor may
use said at least one rule set to predict the occurrence of said events in said system.

9. The method set forth in claim [...] FURTHER CHARACTERIZED IN THAT said step of generating includes
the step of
[...] a quality control processor so that said quality control
processor may predict trends in the occurrence of said events.

10. The method set forth in claim [...] FURTHER CHARACTERIZED IN THAT said particular data is service
provisioning information associated with a particular service feature provided by said telecommunications
system.
FIG. 2